
    Perceptual Grouping for Contour Extraction

    This paper describes an algorithm that efficiently groups line segments into perceptually salient contours in complex images. A measure of affinity between pairs of lines guides group formation and limits the branching factor of the contour-search procedure. The extracted contours are ranked and presented as a contour hierarchy. Our algorithm extracts salient contours in the presence of texture, clutter, and repetitive or ambiguous image structure. We show experimental results on a complex line-set.
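    The abstract does not give the affinity measure itself; as a rough sketch, a pairwise affinity between line segments might combine endpoint proximity and orientation similarity, so that nearby, nearly collinear segments score highly (the functional form and parameters below are assumptions, not the paper's):

    ```python
    import math

    def affinity(seg_a, seg_b, sigma_d=10.0, sigma_t=0.5):
        """Toy affinity between two line segments (hypothetical form:
        the paper's actual measure is not specified here). Combines
        endpoint proximity and orientation similarity."""
        (ax1, ay1), (ax2, ay2) = seg_a
        (bx1, by1), (bx2, by2) = seg_b
        # smallest endpoint-to-endpoint gap between the two segments
        gap = min(math.hypot(ax - bx, ay - by)
                  for ax, ay in seg_a for bx, by in seg_b)
        # orientation difference folded into [0, pi/2]
        ta = math.atan2(ay2 - ay1, ax2 - ax1)
        tb = math.atan2(by2 - by1, bx2 - bx1)
        dt = abs(ta - tb) % math.pi
        dt = min(dt, math.pi - dt)
        # high affinity for nearby, nearly collinear segments
        return math.exp(-gap / sigma_d) * math.exp(-dt / sigma_t)

    a = ((0, 0), (10, 0))
    b = ((12, 0.5), (22, 0.5))   # nearly collinear continuation of a
    c = ((12, 0), (12, 10))      # perpendicular segment at similar distance
    assert affinity(a, b) > affinity(a, c)
    ```

    In a grouping search, keeping only the top-k affinities per segment is one way such a measure can limit the branching factor.
    
    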

    Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

    We design a novel algorithm for optimal transport by drawing on the entropic optimal transport, mirror descent, and conjugate gradients literatures. Our scalable, GPU-parallelizable algorithm computes the Wasserstein distance with extreme precision, reaching relative error rates of 10^{-8} without numerical stability issues. Empirically, the algorithm converges to high-precision solutions more quickly in wall-clock time than a variety of algorithms, including the log-domain stabilized Sinkhorn algorithm. We provide careful ablations with respect to algorithm and problem parameters, and present benchmarks on upsampled MNIST images, comparing against various recent algorithms on high-dimensional problems. The results suggest that our algorithm can be a useful addition to the practitioner's optimal transport toolkit.
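    For reference, the log-domain stabilized Sinkhorn baseline named above can be sketched in a few lines; this is the standard entropic-OT iteration, not the paper's mirror-descent + conjugate-gradient method:

    ```python
    import numpy as np

    def _lse(M, axis):
        # numerically stable log-sum-exp along the given axis
        m = M.max(axis=axis, keepdims=True)
        return np.squeeze(m + np.log(np.exp(M - m).sum(axis=axis, keepdims=True)), axis=axis)

    def sinkhorn_log(C, a, b, eps=0.05, iters=500):
        """Log-domain stabilized Sinkhorn iteration (the baseline
        mentioned in the abstract). C is the cost matrix; a, b are the
        source/target marginals. Returns the entropic transport plan."""
        f = np.zeros_like(a)
        g = np.zeros_like(b)
        for _ in range(iters):
            # dual-potential updates, stable even for small eps
            f = eps * np.log(a) - eps * _lse((g[None, :] - C) / eps, axis=1)
            g = eps * np.log(b) - eps * _lse((f[:, None] - C) / eps, axis=0)
        return np.exp((f[:, None] + g[None, :] - C) / eps)

    # tiny 1-D example: two point masses transported along a line
    x = np.array([0.0, 1.0]); y = np.array([0.0, 1.0])
    C = (x[:, None] - y[None, :]) ** 2
    a = np.array([0.5, 0.5]); b = np.array([0.5, 0.5])
    P = sinkhorn_log(C, a, b)
    assert np.allclose(P.sum(axis=1), a, atol=1e-6)
    assert np.allclose(P.sum(axis=0), b, atol=1e-6)
    ```

    The plan's marginals match a and b at convergence; the entropic bias scales with eps, which is one reason reaching 10^{-8} relative error is hard for plain Sinkhorn.
    
    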

    Scoring Scene Symmetry


    Measuring Symmetry in Real-World Scenes Using Derivatives of the Medial Axis Radius Function

    Symmetry has been shown to be an important principle guiding the grouping of scene information. Previously, we described a method for measuring the local, ribbon-symmetry content of line drawings of real-world scenes (Rezanejad et al., MODVIS 2017), and we demonstrated that this information has important behavioral consequences (Wilder et al., MODVIS 2017). Here, we describe a continuous, local version of the symmetry measure that allows both ribbon and taper symmetry to be captured. Our original method looked at the difference in radius between successive maximal discs along a symmetric axis. The number of radius differences in a local region that exceeded a threshold, normalized by the total number of differences, was used as the symmetry score at an axis point. We now use the derivative of the radius function along the symmetric axis between two contours, which allows a continuous estimate of the score that does not need a threshold. By replacing the first derivative with a second derivative, we can generalize this method to allow pairs of contours that taper with respect to one another to express high symmetry. Such situations arise, for example, when parallel lines in the 3D world project onto a 2D image. This generalization will allow us to determine the relative importance of taper and ribbon symmetries in natural scenes.
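    The derivative-based idea can be sketched numerically: given samples of the maximal-disc radius r(s) along a medial axis, a constant r(s) (zero first derivative) signals ribbon symmetry, and a linear r(s) (zero second derivative) signals taper symmetry. The exponential mapping of the derivative magnitude to a score is an assumption for illustration, not the authors' exact normalization:

    ```python
    import numpy as np

    def symmetry_scores(radii, ds=1.0):
        """Sketch of derivative-based symmetry scores. `radii` samples
        the maximal-disc radius r(s) along a medial axis at spacing
        `ds`. The exp(-|.|) score mapping is a hypothetical choice."""
        r1 = np.gradient(radii, ds)   # r'(s) ~ 0 for parallel (ribbon) contours
        r2 = np.gradient(r1, ds)      # r''(s) ~ 0 for linearly tapering contours
        ribbon = np.exp(-np.abs(r1))  # continuous score, no threshold needed
        taper = np.exp(-np.abs(r2))
        return ribbon, taper

    s = np.linspace(0.0, 10.0, 101)
    parallel = np.full_like(s, 2.0)   # constant radius: ribbon-symmetric
    tapering = 2.0 + 0.3 * s          # linear radius: taper-symmetric only
    rib_p, tap_p = symmetry_scores(parallel, ds=0.1)
    rib_t, tap_t = symmetry_scores(tapering, ds=0.1)
    assert rib_p.mean() > rib_t.mean()   # ribbon score penalizes tapering
    assert tap_t.mean() > 0.99           # taper score stays high for linear taper
    ```

    Replacing the thresholded count with a smooth function of the derivative is what makes the score continuous at every axis point.
    
    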

    StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

    Instructional videos are an important resource for learning procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, a task called key-step localization. Traditional methods for key-step localization require video-level human annotations and thus do not scale to large datasets. In this work, we tackle the problem with no human supervision and introduce StepFormer, a self-supervised model that discovers and localizes instruction steps in a video. StepFormer is a transformer decoder that attends to the video with learnable queries and produces a sequence of slots capturing the key-steps in the video. We train our system on a large dataset of instructional videos, using their automatically generated subtitles as the only source of supervision. In particular, we supervise our system with a sequence of text narrations using an order-aware loss function that filters out irrelevant phrases. We show that our model outperforms all previous unsupervised and weakly supervised approaches on step detection and localization by a large margin on three challenging benchmarks. Moreover, our model demonstrates an emergent property to solve zero-shot multi-step localization and outperforms all relevant baselines at this task. Comment: CVPR'2
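    The core read operation — learnable queries cross-attending over frame features to produce step slots — can be sketched minimally. StepFormer itself is a full transformer decoder trained end to end; the single attention layer and weight names below are illustrative assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def step_slots(video_feats, queries, Wq, Wk, Wv):
        """One cross-attention read: K learnable step queries attend
        over T frame features and return K step slots. (A sketch, not
        the paper's architecture; StepFormer stacks decoder layers.)"""
        Q = queries @ Wq                  # (K, d) projected queries
        K_ = video_feats @ Wk             # (T, d) projected keys
        V = video_feats @ Wv              # (T, d) projected values
        attn = softmax(Q @ K_.T / np.sqrt(Q.shape[-1]), axis=-1)  # (K, T)
        return attn @ V                   # (K, d) step slots

    T, K, d = 32, 4, 16
    video = rng.normal(size=(T, d))       # stand-in for frame features
    queries = rng.normal(size=(K, d))     # learnable in training; random here
    Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
    slots = step_slots(video, queries, Wq, Wk, Wv)
    assert slots.shape == (K, d)
    ```

    Each slot is a soft temporal average of frame features, which is what lets the attention weights themselves be read out as a step-localization signal.
    
    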